Rotation Invariant Sparse Coding and PCA
Authors
Abstract
We attempt to encode an image in a fashion that is only weakly dependent on the rotation of objects within the image, as an extension of Roger Grosse's work on translation-invariant sparse coding. Our approach is to specify only a small set of basis images, from which a reasonably large number of rotated bases are calculated. The image is fit to this set of rotated bases, so that ultimately the sparse code representation is given in terms of rotations of a small set of vectors. We develop an image representation scheme and several algorithms suited to this problem, particularly gradient descent methods for sparse coding and an investigation of rotation-invariant Principal Components Analysis. Our experimental results suggest that PCA is significantly more fruitful, unless better algorithms for the sparse coding side can be developed in the future.

1. Motivation

We are motivated ultimately by image recognition and classification. Many existing algorithms are overly sensitive to rotations, and often translations, of the image, which can drastically alter the way it is encoded and thus the way an algorithm might classify it. This research is aimed at producing an image coding method in which even rotations of individual objects in the image will not drastically alter the image representation. As an example, an image might contain several objects, each of which can rotate freely. We develop a system in which all possible rotations of these objects would be coded nearly identically (differing only in the index of some coefficients), in a manner that is not predisposed to any orientation of the overall image, or indeed to the orientation of any subportion of it.

2. Rotation Formulation and Image Construction

The foundation of both our sparse coding and PCA attempts is a robust formalism for representing the image in terms of rotated bases. The building blocks of our image representation are circular image patches.
A parameter $n$, the radius of the patches, is given, and each patch is represented in two forms: as a $(2n+1) \times (2n+1)$ matrix, in which entries at distance greater than $n$ from the center are ignored, and as a column vector listing all the elements within distance $n$ of the center in an arbitrary but fixed order. The vector representation allows rotations to be defined simply as linear transformations of vectors. In particular, before construction begins, we fix the number of rotations $r$ and calculate rotation matrices $T_1, \ldots, T_r$, where $T_i$ represents rotating a vector by $2\pi i / r$ radians. Note that while ideally these would form a group under multiplication, there is some distortion due to fitting images into pixels, so we generally compute each $T_i$ individually rather than composing them with each other.

Given a basis set $B$ of $b$ vectors, a radius $n$, and a number of rotations $r$, we use the following procedure to construct the image from these bases. The image specification is given by a $b \times r \times W \times H$ matrix $S$, where $W$ and $H$ are the dimensions of the image; $S$ gives the weight of each possible rotation of each basis vector, centered on each point in the image. We first construct the intermediate matrix $Z$, which is $W \times H \times m$:

Nathan Pflueger, Ryan Timmons. Date: 14 December 2006.
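The patch vectorization and the rotation matrices described in Section 2 can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `disk_offsets`, `rotation_matrix`, and `apply` are hypothetical, the row-major pixel ordering is one arbitrary choice among many, and nearest-neighbour resampling is an assumption (the text does not say how pixel values are interpolated).

```python
import math


def disk_offsets(n):
    """Offsets (dy, dx) of all pixels within distance n of the patch centre.

    The list order fixes the (arbitrary) ordering of the patch vector.
    """
    return [(dy, dx)
            for dy in range(-n, n + 1)
            for dx in range(-n, n + 1)
            if dy * dy + dx * dx <= n * n]


def rotation_matrix(n, i, r):
    """Matrix T_i (list of rows) rotating a vectorized patch by 2*pi*i/r.

    Assumption of this sketch: nearest-neighbour resampling, so each row
    holds at most one entry equal to 1.
    """
    offsets = disk_offsets(n)
    index = {off: k for k, off in enumerate(offsets)}
    theta = 2 * math.pi * i / r
    c, s = math.cos(theta), math.sin(theta)
    m = len(offsets)
    T = [[0.0] * m for _ in range(m)]
    for k, (dy, dx) in enumerate(offsets):
        # Map each destination pixel back to the source pixel it comes from.
        sx = c * dx + s * dy
        sy = -s * dx + c * dy
        src = (round(sy), round(sx))
        # A source that rounds outside the disk leaves the row at zero;
        # this is one form of the pixel distortion mentioned in the text.
        if src in index:
            T[k][index[src]] = 1.0
    return T


def apply(T, v):
    """Matrix-vector product T v for a matrix stored as a list of rows."""
    return [sum(t * x for t, x in zip(row, v)) for row in T]


# Usage: rotate a radius-2 patch by a quarter turn (i = 1 of r = 4).
patch = [float(k) for k in range(len(disk_offsets(2)))]
rotated = apply(rotation_matrix(2, 1, 4), patch)
```

For quarter turns the pixel grid maps onto itself, so the matrices here are exact permutations. For general angles, rounding means that composing two of these matrices need not equal the matrix computed directly for the combined angle, which is consistent with the text's choice to compute each $T_i$ individually.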
Similar resources
Airplane detection based on rotation invariant and sparse coding in remote sensing images
Airplane detection has attracted great interest from researchers in the remote sensing field. In this paper, we propose a new approach to feature extraction for airplane detection based on sparse coding in high-resolution optical remote sensing images. However, the direction of an airplane in images makes feature extraction difficult. We focus on the airplane feature possessing rotation invaria...
Subgraphs Matching-Based Side Information Generation for Distributed Multiview Video Coding
We adopt constrained relaxation for distributed multiview video coding (DMVC). The novel framework integrates graph-based segmentation and matching to generate inter-view correlated side information without knowing the camera parameters, inspired by subgraph semantics and sparse decomposition of high-dimensional scale invariant feature data. The sparse data as a good hypothesis space aim for ...
Flip-invariant Video Copy Detection Using Sparse-coded Features
Nowadays, a large number of videos are available in video databases, social networking sites and other web servers. The large size of these video databases makes it difficult to trace the video content. To protect the copyright of the videos in a video database, a video copy detection system is needed. A video copy detection system stores the video features that characterize a video along with the video in...
FRIST - Flipping and Rotation Invariant Sparsifying Transform Learning and Applications
Features based on sparse representation, especially using the synthesis dictionary model, have been heavily exploited in signal processing and computer vision. However, synthesis dictionary learning typically involves NP-hard sparse coding and expensive learning steps. Recently, sparsifying transform learning received interest for its cheap computation and its optimal updates in the alternating...
Face Recognition using an Affine Sparse Coding approach
Sparse coding is an unsupervised method which learns a set of over-complete bases to represent data such as images and video. Sparse coding has attracted increasing interest for image classification applications in recent years. But in cases where we have similar images from different classes, such as in face recognition applications, different images may be classified into the same class, and hen...